Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We further show that training times can be reduced by parallelizing the algorithm across multiple robots that pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door-opening skill on real robots without any prior demonstrations or manually designed representations.
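The asynchronous pooling of updates across robots can be sketched with a toy example. The following Python snippet is our own illustration, not the paper's algorithm: it replaces the deep Q-function with a tabular one on a hypothetical chain MDP, and the worker threads stand in for robots. All names and hyperparameters (`N_STATES`, the exploration rate, `alpha`, `gamma`) are illustrative assumptions; the point is only the pattern of workers feeding a shared replay buffer while a learner applies off-policy updates to shared Q-values.

```python
import threading
import random
from collections import defaultdict

# Toy illustration (not the paper's NAF-based method): "robot" worker threads
# collect experience into a shared replay buffer while a learner thread runs
# off-policy Q-learning updates on a shared Q-function that all workers read.

N_STATES = 5              # chain MDP: start at state 0, goal at the far right
GOAL = N_STATES - 1
ACTIONS = (-1, +1)        # step left / step right

def step(state, action):
    """One transition of the toy chain MDP; reward 1.0 only at the goal."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

lock = threading.Lock()
replay = []               # replay buffer pooled across all robots
Q = defaultdict(float)    # shared Q-values, keyed by (state, action)

def q_update(s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """Standard off-policy Q-learning update on the shared table."""
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def worker(seed, episodes=40):
    """A 'robot': acts epsilon-greedily w.r.t. the shared Q, pools its data."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < 0.3:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            with lock:
                replay.append((s, a, r, s2, done))
            s = s2

def learner(stop):
    """Asynchronous learner: samples pooled transitions while robots act."""
    rng = random.Random(0)
    while not stop.is_set():
        with lock:
            if replay:
                q_update(*replay[rng.randrange(len(replay))])

stop = threading.Event()
robots = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
trainer = threading.Thread(target=learner, args=(stop,))
trainer.start()
for t in robots:
    t.start()
for t in robots:
    t.join()
stop.set()
trainer.join()

# A few synchronous sweeps over the pooled data to finish convergence.
for _ in range(30):
    for transition in replay:
        q_update(*transition)

# The greedy policy should now move right (+1) in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # → {0: 1, 1: 1, 2: 1, 3: 1}
```

Because the updates are off-policy, the learner can consume transitions from any robot regardless of which policy snapshot generated them; this is what lets data collection be pooled and parallelized without synchronizing the robots.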